A Template-Based Approach to Summarize XML Collections

Authors

  • Gudrun Fischer
  • Igor Jacy Lino Campista
Abstract

Existing summarization approaches for XML concentrate on extracting common structure and compressing the data in order to optimize storage and speed up queries. Neither compression nor structure extraction suffices for advanced, content-based summarization tasks. We present a set of tools for semi-automatic summarization of XML collections, in which the user specifies semantically relevant features for an XML collection in a template and defines rules for summarization. The system assists the user in generating one or several such templates, selects applicable templates for a given collection, and applies them for automatic summarization. In experiments on the INEX collection (among others), we investigate the merits and limitations of our approach.

Similar resources

ToXgene: An extensible template-based data generator for XML

Synthetic collections of XML documents are useful in many applications, such as benchmarking (e.g., XMach-1, XMark) and algorithm testing and evaluation. We present ToXgene, a template-based generator for large, consistent collections of synthetic XML documents. Templates are annotated XML Schema specifications describing both the structure and the content of the data to be generated. Our tool ...

Using XML for flexible data entry in healthcare: example use for pathology

This paper describes a pragmatic, generic and flexible approach for the management of XML-structured data, using pathology reports as an example. The flexibility of this approach is based on a template concept. The template describes the documents of a given (clinical) domain in terms of structure and user interface requirements. The template enables a so-called document manager to provide a corre...

Code Generation via XML/XSLT vs. CC-Based Approaches

www.XML-JOURNAL.com, August 2002. Typically, the purpose of the software subsystem is to generate a concrete implementation from declarative models. This could be viewed as an extension of MVC (Model-View-Controller) architecture by incorporating a generator component (i.e., MVCG). Adopting a generative approach in software development is a goal cherished by many application developers. Why writ...

An XML-based Approach for the Presentation and Exploitation of Extracted Information

We present an approach for exploiting knowledge from documents on the web. It is based on the integration of XML technologies with robust tools for natural language processing. The overall goal is to offer a knowledge engineer as much support as possible for the task of extracting and formalizing knowledge from document collections.

A Clustered Index Approach to Distributed XPath

Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. While XML supports expressive query languages such as XPath and XQuery, writing an appropriate query in these languages requires schema knowledge, which may not be available in distributed systems with autonomous and dynamic sources. Thus, there is a need for approximate query processing. Furthe...

Publication date: 2005